To alleviate data scarcity, this paper collects and annotates two infrared tracking datasets for sequential small target detection, named ATR-ISTD and UAV-ISTD. It then proposes a sequential small target detection network that integrates a memory pool. The network exploits the correlation between preceding and subsequent frames and reads memory information by matching the query frame against memory frames. This mitigates the high false alarm rate and low accuracy of infrared small target detection against highly cluttered backgrounds. To reduce the loss of small target features caused by downsampling, a forward semantic guided fusion module (PSGF) is designed to fuse features of different scales. In the memory vector encoder, a pseudo-label guided feature enhancement module (PLG-FE) is designed to strengthen the local feature representation of small targets. Experimental results show that, compared with mainstream single-frame detection methods, the proposed method significantly reduces false alarm rates, with improvements of 16.87% and 10.49% on the ATR-ISTD and UAV-ISTD datasets, respectively. Target-level F1 scores increase by 4.89% and 6.54%, and pixel-level F1 scores by 7.69% and 11.63%.
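The abstract does not specify how memory matching between the query frame and the memory frames is computed. A common way to "read" a memory is scaled dot-product attention. Each query-frame location is compared with every memory location, and the softmax-weighted memory values are returned. The sketch below assumes that formulation. The function names `softmax` and `memory_read` are hypothetical, and this is not the paper's implementation.

```python
import math

def softmax(scores):
    # numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_read(query_keys, memory_keys, memory_values):
    """Attention-style memory read (hypothetical sketch).

    query_keys:    key vectors for each location of the query frame
    memory_keys:   key vectors for every location of every memory frame
    memory_values: value vectors paired with memory_keys
    Returns one read-out vector per query location.
    """
    scale = 1.0 / math.sqrt(len(query_keys[0]))
    dim = len(memory_values[0])
    readout = []
    for q in query_keys:
        # similarity of this query location to every memory location
        sims = [scale * sum(a * b for a, b in zip(q, k)) for k in memory_keys]
        weights = softmax(sims)
        # softmax-weighted sum of the memory values
        readout.append([sum(w * v[d] for w, v in zip(weights, memory_values))
                        for d in range(dim)])
    return readout
```

A query whose key closely matches one memory location reads back mostly that location's value. This is how temporal context from earlier frames can reinforce a faint target.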
To address the significant decline in positioning accuracy of traditional algorithms under indoor non-line-of-sight (NLOS) conditions and low beacon deployment density, this paper proposes a fusion positioning method based on ranging correction that combines Bluetooth Low Energy (BLE) and pedestrian dead reckoning (PDR). First, the BLE received signal strength (RSS) field is rapidly constructed using the SketchUp indoor 3D modeling software combined with a ray-tracing algorithm, eliminating tedious manual RSS collection in the field. Next, a variational autoencoder based on a convolutional neural network (VAE-CNN) is designed to predict and correct BLE ranging errors, thereby improving BLE positioning accuracy. Finally, an extended Kalman filter (EKF) fuses the BLE and PDR positioning results. Experimental results show that both the ranging-corrected BLE positioning and the EKF-based fusion positioning achieve superior performance in environments with NLOS interference and low beacon deployment density.
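The final fusion step can be illustrated with a loosely coupled filter. PDR steps drive the prediction, and each BLE position fix drives the correction. The sketch makes several simplifying assumptions that are not from the paper:

* The state is position only, `[x, y]`.
* Covariances are diagonal, so each axis is filtered independently.
* BLE supplies a direct position fix.

With a position-only state, the PDR motion model's Jacobian is the identity, so the EKF reduces to a linear Kalman filter. The class name `PositionFusion` and the noise values `q` and `r` are hypothetical.

```python
import math

class PositionFusion:
    """Loosely coupled BLE/PDR fusion with a position-only state [x, y].

    Hypothetical simplification: diagonal covariances (axes filtered
    independently) and BLE providing a direct position fix.
    """

    def __init__(self, x, y, var0=1.0):
        self.state = [x, y]
        self.var = [var0, var0]          # diagonal of the covariance P

    def predict(self, step_length, heading, q=0.05):
        # PDR propagation: advance one step along the heading (radians)
        self.state[0] += step_length * math.cos(heading)
        self.state[1] += step_length * math.sin(heading)
        self.var = [v + q for v in self.var]   # P = F P F^T + Q with F = I

    def update(self, ble_x, ble_y, r=1.0):
        # BLE position fix with measurement variance r on each axis
        for i, z in enumerate((ble_x, ble_y)):
            k = self.var[i] / (self.var[i] + r)          # Kalman gain
            self.state[i] += k * (z - self.state[i])     # pull toward BLE fix
            self.var[i] *= (1.0 - k)                     # reduce uncertainty
        return tuple(self.state)
```

With equal prediction and measurement variances, the gain is 0.5 and the fused estimate lands midway between the PDR prediction and the BLE fix. A smaller BLE variance `r`, for example after ranging correction, shifts trust toward BLE.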
The integration of mobile edge computing (MEC) and wireless power transfer (WPT) can effectively alleviate the limited computing resources and battery capacity of wireless devices. To address the dynamic energy-efficient offloading problem in wireless-powered MEC systems under a nonlinear energy harvesting model, this paper proposes an energy consumption optimization algorithm based on Lyapunov optimization theory. The algorithm jointly optimizes the server's computing frequency, the energy station's transmit power, the task offloading time, the device transmit power, and the local computing frequency. This minimizes the system's long-term average energy consumption while ensuring system stability. Lyapunov optimization transforms the stochastic optimization problem into deterministic per-time-slot subproblems. These are solved with the Lagrange multiplier method and an improved whale optimization algorithm. Simulation results show that, compared with benchmark schemes, the proposed offloading strategy significantly reduces system energy consumption while maintaining long-term task queue stability.
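The core of the Lyapunov approach is the per-slot drift-plus-penalty trade-off. A task queue evolves as Q(t+1) = max(Q(t) - served, 0) + arrivals. In each slot the controller minimizes V·energy minus Q·(work done).

The toy example below assumes the common CMOS energy model κ·f³·τ for local computing, where C is the number of CPU cycles per bit. It optimizes only the local frequency f, which has a closed-form minimizer. This is a hypothetical single-variable illustration. It is not the paper's joint subproblem, which also covers server frequency, transmit powers, and offloading time, and is solved with Lagrange multipliers and an improved whale optimization algorithm. The function and parameter names are illustrative.

```python
import math

def update_queue(q, arrivals, served):
    """Task-queue (backlog) update used in Lyapunov analysis:
    Q(t+1) = max(Q(t) - served, 0) + arrivals."""
    return max(q - served, 0.0) + arrivals

def local_frequency(q, v, kappa, cycles_per_bit, f_max):
    """Per-slot drift-plus-penalty choice of local CPU frequency f (toy model).

    Minimises  V * kappa * f**3 * tau  -  Q * f * tau / C  over 0 <= f <= f_max.
    tau cancels in the derivative 3*V*kappa*f**2*tau - Q*tau/C = 0, giving
    f* = sqrt(Q / (3 * V * kappa * C)); the objective is convex for f >= 0,
    so clipping f* to f_max yields the constrained optimum.
    """
    if q <= 0:
        return 0.0                      # empty queue: no reason to spend energy
    f_star = math.sqrt(q / (3.0 * v * kappa * cycles_per_bit))
    return min(f_star, f_max)
```

A larger backlog Q pushes the frequency up to drain the queue and preserve stability. A larger weight V favors energy saving, which is the usual O(1/V) energy versus O(V) backlog trade-off of drift-plus-penalty.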
The localization and recognition of key symbols in engineering drawings has long been an important application of computer vision. Compared with traditional methods, deep learning-based text detection approaches offer higher detection efficiency and accuracy, which motivates applying existing text detection algorithms to engineering drawing recognition. This paper proposes a deep learning-based method for localizing and recognizing key symbols in engineering drawings, focusing on index symbols and dimension symbols. For index symbol localization, the drawings are cropped to a uniform size, and non-maximum suppression (NMS) is used to remove redundant candidate boxes. For dimension symbol localization, detection is performed on the complete masked drawing. The intersection over union (IoU) between each detected box and the index symbol locations is then computed to filter out part of the detections. Experimental results show that the proposed method achieves high precision and recall in both the localization and recognition of index and dimension symbols.
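Both localization steps rely on box overlap, so a short sketch of IoU, greedy NMS, and an IoU-based filter may help. The abstract gives neither the thresholds nor the exact filtering rule. The values 0.5 and 0.1 are placeholders, and discarding dimension boxes that overlap index symbols is an assumed interpretation. `drop_overlapping` is a hypothetical helper.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap any higher-scoring kept box
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

def drop_overlapping(dim_boxes, index_boxes, thresh=0.1):
    """Hypothetical filter: discard dimension-symbol boxes whose IoU with
    any index-symbol box exceeds thresh."""
    return [d for d in dim_boxes if all(iou(d, s) <= thresh for s in index_boxes)]
```

For example, two nearly coincident candidates (0, 0, 2, 2) and (0.1, 0, 2.1, 2) have an IoU of about 0.9, so NMS keeps only the higher-scoring one.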
To address the computational bottlenecks that classical neural networks face as data scale grows explosively, quantum convolutional neural networks (QCNNs) based on quantum computing have become a research hotspot. This study constructs a QCNN for image classification within the limited resources of noisy intermediate-scale quantum (NISQ) devices. The model employs angle encoding, a convolutional layer based on a data re-uploading classifier, and a subsequent four-qubit pooling layer. Two quantum fully connected layer architectures are designed for image classification, and the effect of their structure on QCNN classification performance is analyzed. Simulation results show that the proposed QCNN achieves high classification accuracy and good generalization in binary classification tasks, with a maximum accuracy of 100.00%, a minimum of 94.55%, and an average of 97.29%. Furthermore, increasing the circuit depth improves model performance, enabling the QCNN to exceed 90% accuracy in four-class classification tasks.
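Angle encoding and data re-uploading can be shown on a single simulated qubit. Each layer re-encodes the input feature as a rotation angle and then applies a trainable rotation. The readout is the Pauli-Z expectation. This is a toy statevector sketch in plain Python, with no quantum SDK. The gate choice (RY for encoding, RZ for processing), the layer parameters `(w, b, phi)`, and the scalar input are assumptions. The paper's convolutional layer applies the idea to image patches on multiple qubits.

```python
import cmath
import math

def ry(a):
    # single-qubit rotation about the Y axis
    c, s = math.cos(a / 2), math.sin(a / 2)
    return [[c, -s], [s, c]]

def rz(a):
    # single-qubit rotation about the Z axis
    return [[cmath.exp(-1j * a / 2), 0], [0, cmath.exp(1j * a / 2)]]

def apply(gate, state):
    # 2x2 matrix times 2-component statevector
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def reuploading_classifier(x, params):
    """Single-qubit data re-uploading circuit (toy, hypothetical parameters).

    Each layer (w, b, phi) re-encodes the scalar feature x by angle encoding,
    RY(w * x + b), then applies a trainable RZ(phi). Returns <Z> in [-1, 1],
    whose sign can serve as a binary class label.
    """
    state = [1 + 0j, 0j]                      # start in |0>
    for w, b, phi in params:
        state = apply(ry(w * x + b), state)   # (re-)upload the data
        state = apply(rz(phi), state)         # trainable processing
    return abs(state[0]) ** 2 - abs(state[1]) ** 2
```

With one layer and `phi = 0`, the circuit is plain angle encoding and returns cos(x). Stacking layers re-injects x between trainable rotations, which lets even one qubit represent non-trivial decision boundaries. This mirrors the observation in the abstract that greater circuit depth improves performance.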
To improve 4-DoF grasp detection, this paper refines the grasp representation and proposes a depth-guided multi-scale grasp detection framework (DGM-Grasp) for robotic manipulators. Built on an encoder-decoder architecture, the framework integrates a multi-scale cross-spatial attention downsampling module to better focus on grasp-relevant features. A progressive multi-scale feature fusion and decoding module is designed to extract semantic information at different scales. In addition, a depth-guided grasp filtering module is introduced to avoid collisions during grasping. Experimental results show that DGM-Grasp achieves accuracies of 98.6% and 95.25% on the Cornell and Jacquard single-object datasets, respectively, while reducing detection time to 21 ms. The method also performs well on multi-object datasets and achieves a 96% success rate in ablation and real-world grasping experiments. These results demonstrate the superior generalization ability and performance of DGM-Grasp.
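Accuracy on Cornell and Jacquard is commonly reported with the rectangle metric. A predicted grasp counts as correct if its angle is within 30° of a ground-truth grasp and the two rectangles have an IoU above 0.25. The abstract does not state which metric is used, so this is an assumption.

The sketch represents a grasp as `(x, y, theta, w, h)`. It computes rotated-rectangle IoU with Sutherland-Hodgman polygon clipping. All function names are illustrative.

```python
import math

def corners(g):
    """Counter-clockwise corners of a grasp rectangle g = (x, y, theta, w, h)."""
    x, y, t, w, h = g
    c, s = math.cos(t), math.sin(t)
    return [(x + dx * c - dy * s, y + dx * s + dy * c)
            for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))]

def area(poly):
    # shoelace formula
    return 0.5 * abs(sum(p[0] * q[1] - q[0] * p[1] for p, q in zip(poly, poly[1:] + poly[:1])))

def _inside(p, a, b, eps=1e-12):
    # p lies on or to the left of the directed edge a -> b
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= -eps

def _intersect(p, q, a, b):
    # intersection of segment p-q with the infinite line through a and b
    den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
    if abs(den) < 1e-12:  # (nearly) parallel: fall back to the endpoint
        return q
    t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

def clip(subject, clipper):
    """Sutherland-Hodgman clipping of a polygon by a convex CCW polygon."""
    out = subject
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        inp, out = out, []
        for p, q in zip(inp, inp[1:] + inp[:1]):  # edge p -> q
            if _inside(q, a, b):
                if not _inside(p, a, b):
                    out.append(_intersect(p, q, a, b))
                out.append(q)
            elif _inside(p, a, b):
                out.append(_intersect(p, q, a, b))
    return out

def grasp_iou(pred, truth):
    p, t = corners(pred), corners(truth)
    inter = area(clip(p, t))
    return inter / (area(p) + area(t) - inter)

def rectangle_metric(pred, truth, max_angle_deg=30.0, min_iou=0.25):
    # grasp orientation is symmetric under a 180-degree rotation
    d = abs(pred[2] - truth[2]) % math.pi
    d = min(d, math.pi - d)
    return math.degrees(d) < max_angle_deg and grasp_iou(pred, truth) > min_iou
```

In practice, a prediction is usually scored against every ground-truth rectangle of the object and counted as correct if it matches any of them.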
Temporal knowledge graph reasoning, which predicts events absent from the graph, has found important applications in recommendation systems, question answering, and healthcare. The lack of background knowledge in temporal knowledge graphs hinders reasoning, and existing methods rely on external graphs while overlooking the implicit information within the graph itself. To fully exploit this implicit background information, this paper extracts cross-temporal features to define entity backgrounds and proposes a temporal knowledge graph reasoning model incorporating cross-time commonality features (TR-CTC). TR-CTC uses a graph neural network to extract cross-temporal commonality from multi-hop paths and integrates it as background information into graph representation learning, thereby improving reasoning performance. Experimental results show that TR-CTC generally outperforms baseline models on link prediction tasks.
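Link prediction on temporal knowledge graphs is typically evaluated by ranking the true entity among all candidates and reporting mean reciprocal rank (MRR) and Hits@k. The abstract does not name its metrics, so these are assumed. The sketch uses the raw setting with pessimistic tie-breaking. Many papers instead report a filtered setting that removes other known true facts before ranking, which gives different numbers. The function names are hypothetical.

```python
def rank_of_target(scores, target):
    """1-based rank of the true entity among all candidates (raw setting).
    Ties are broken pessimistically: equal scores count as ranked above."""
    t = scores[target]
    return 1 + sum(1 for i, s in enumerate(scores) if i != target and s >= t)

def link_prediction_metrics(all_scores, targets, ks=(1, 3, 10)):
    """MRR and Hits@k over a batch of queries.

    all_scores: one list of candidate-entity scores per query
    targets:    index of the true entity for each query
    """
    ranks = [rank_of_target(s, t) for s, t in zip(all_scores, targets)]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {k: sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return mrr, hits
```

For two queries whose true entities rank first and second, MRR is (1 + 1/2) / 2 = 0.75 and Hits@1 is 0.5.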
For resource-limited edge computing devices, this paper proposes a long short-term memory (LSTM) neural network accelerator based on a distributed systolic array architecture. The design distributes input data storage to reduce data movement and power consumption, while systolic data transfer minimizes the idle rate of the computing units and improves computational efficiency. Experimental validation on a VU13P field-programmable gate array (FPGA) shows that the proposed LSTM accelerator achieves an effective computing power of 179.2 GOPS at an operating frequency of 200 MHz, with a dynamic power consumption of 0.343 W and an energy efficiency of 522.4 GOPS/W. Compared with typical existing designs, the proposed accelerator improves energy efficiency by more than 34%.
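The reported figures are mutually consistent. Energy efficiency is effective throughput divided by dynamic power. Dividing throughput by the clock frequency gives the sustained number of operations per cycle:

```latex
\eta = \frac{P_{\text{eff}}}{P_{\text{dyn}}}
     = \frac{179.2\ \text{GOPS}}{0.343\ \text{W}} \approx 522.4\ \text{GOPS/W},
\qquad
\frac{179.2 \times 10^{9}\ \text{ops/s}}{200 \times 10^{6}\ \text{cycles/s}} = 896\ \text{ops/cycle}.
```

Under the common convention of counting one multiply-accumulate as two operations, 896 operations per cycle would correspond to about 448 active MAC units per cycle. This is an assumption, because the abstract does not state its operation-counting convention.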
Generative text summarization models can produce novel expressions in summaries, but even the most advanced models may generate content that contradicts the source text or cannot be factually verified, a phenomenon known as hallucination. To address this issue, this paper proposes an intrinsic hallucination optimization method for the summary generation process. The approach mitigates hallucinations at three levels: data, model training, and summary generation strategy. Experiments on two benchmark datasets demonstrate its effectiveness. Compared with baseline models, it improves the ROUGE-1 (R-1) score by an average of 8.58% on the CNN/DailyMail (CNNDM) dataset and 7.26% on the XSum dataset. The results indicate that the method both enhances summary quality and effectively reduces hallucinations, providing a useful reference for the practical deployment of generative text summarization models.
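The R-1 score reported above is based on clipped unigram overlap between a generated summary and a reference. The minimal sketch below uses lower-cased whitespace tokenization without stemming. Reference ROUGE scorers apply their own tokenization and optional stemming, so their numbers can differ from this illustration. The function name `rouge1` is hypothetical.

```python
from collections import Counter

def rouge1(candidate, reference):
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap.
    Simplification: lower-cased whitespace tokenisation, no stemming."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # & keeps the min count per unigram
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return p, r, 2 * p * r / (p + r)
```

For the candidate "the cat on the mat" against the reference "the cat sat on the mat", all 5 candidate unigrams match, giving precision 1, recall 5/6, and F1 10/11.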
To address the security challenges of relay communication in complex environments with potential eavesdroppers, this paper proposes a multi-UAV-assisted relay communication network that provides secure communication services for users. A multi-agent deep reinforcement learning (MARL) algorithm based on the Q-mixing network (QMIX) is employed to jointly optimize UAV trajectories and power allocation. The goal is to guarantee the minimum transmission rate of low-security-sensitivity users (secondary users) while enhancing the communication security and data rate of high-security-sensitivity users (primary users). Simulation results show that, compared with the Double Deep Q-Network (Double DQN) and the Dueling Deep Q-Network (Dueling DQN), the proposed algorithm improves the cumulative reward by approximately 15.5% and 1.26%, respectively. Moreover, the rate-splitting multiple access (RSMA) scheme adopted in the proposed network significantly outperforms space-division multiple access (SDMA) and non-orthogonal multiple access (NOMA) in overall system performance and information security. The proposed method offers an effective solution for secure and efficient communication in multi-user wireless networks.
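QMIX combines per-agent Q-values (here, one per UAV) into a joint value Q_tot through a mixing network conditioned on the global state. Its mixing weights are generated by hypernetworks and forced to be non-negative, which makes Q_tot monotonic in every agent's Q-value. Each agent can then act greedily on its own Q-value during decentralized execution.

The forward-pass sketch below simplifies each hypernetwork to a single linear map with fixed illustrative parameters. The original QMIX uses a small two-layer hypernetwork for the final bias. The parameter layout in `hyper` and the function names are hypothetical, not the paper's network.

```python
import math

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def qmix_mix(agent_qs, state, hyper):
    """Forward pass of a QMIX-style monotonic mixing network (simplified).

    agent_qs: per-agent Q-values for the chosen actions
    state:    global state vector
    hyper:    hypothetical linear hypernetwork parameters:
              'w1' -> (n_agents*embed) x state_dim, 'b1' -> embed x state_dim,
              'w2' -> embed x state_dim,            'b2' -> 1 x state_dim
    abs() on the generated weights keeps dQ_tot/dQ_i >= 0 (monotonicity).
    """
    n, embed = len(agent_qs), len(hyper["b1"])
    w1 = [abs(w) for w in matvec(hyper["w1"], state)]   # non-negative, n x embed
    b1 = matvec(hyper["b1"], state)
    hidden = [elu(sum(agent_qs[i] * w1[i * embed + j] for i in range(n)) + b1[j])
              for j in range(embed)]
    w2 = [abs(w) for w in matvec(hyper["w2"], state)]   # non-negative
    b2 = matvec(hyper["b2"], state)[0]                  # biases may be negative
    return sum(h * w for h, w in zip(hidden, w2)) + b2
```

Because ELU is increasing and all mixing weights are non-negative, raising any single agent's Q-value never lowers Q_tot. Greedy per-agent actions are therefore consistent with maximizing the joint value.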